Gathering Metadata from Web-Based Repositories of Historical Publications

Authors

  • Ismael Sanz
  • Rafael Berlanga Llavori
  • María José Aramburu Cabo

Abstract

In this paper we examine the problem of extracting schema-conforming metadata from HTML sources. We describe a technique based on semistructured data analysis. It combines HTML styles, which abstract the visual characteristics of documents, with document-oriented context-free grammars, which provide structural information. The technique is flexible enough to be applied not only to individual HTML documents but also to hyperlinked web structures, which makes it possible to navigate the repositories in an informed and tightly controlled way.
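
The abstract describes the technique only at a high level. As a minimal sketch, assuming a hypothetical tag-to-style table (`STYLE_OF_TAG`) and a hypothetical grammar in which a record is a heading or bold title, one or more italic author names, and a four-digit year, the Python code below illustrates the general idea: the HTML is first reduced to a sequence of abstract style tokens, and a recursive-descent parser for the grammar then picks out schema-conforming records. This is not the authors' implementation; every name in it is illustrative.

```python
import re
from html.parser import HTMLParser

# Hypothetical mapping from concrete HTML tags to abstract visual styles.
# Only the visual role of an element matters, not the tag that produced it.
STYLE_OF_TAG = {
    "h1": "HEADING", "h2": "HEADING",
    "b": "BOLD", "strong": "BOLD",
    "i": "ITALIC", "em": "ITALIC",
}


class StyleTokenizer(HTMLParser):
    """Turns an HTML fragment into a flat list of (style, text) tokens."""

    def __init__(self):
        super().__init__()
        self.stack = []   # abstract styles of the currently open styled elements
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        if tag in STYLE_OF_TAG:
            self.stack.append(STYLE_OF_TAG[tag])

    def handle_endtag(self, tag):
        if tag in STYLE_OF_TAG and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            style = self.stack[-1] if self.stack else "PLAIN"
            self.tokens.append((style, text))


# Hypothetical grammar over style tokens:
#   Repository -> Record*
#   Record     -> Title Authors Year
#   Title      -> HEADING | BOLD
#   Authors    -> ITALIC ITALIC*
#   Year       -> PLAIN   (text must be four digits)
def parse_record(tokens, i):
    """Try to parse one Record starting at tokens[i]; return (record, next_index)."""
    n = len(tokens)
    if i >= n or tokens[i][0] not in ("HEADING", "BOLD"):
        return None, i
    title = tokens[i][1]
    j = i + 1
    authors = []
    while j < n and tokens[j][0] == "ITALIC":
        authors.append(tokens[j][1])
        j += 1
    if not authors:
        return None, i
    if j >= n or tokens[j][0] != "PLAIN" or not re.fullmatch(r"\d{4}", tokens[j][1]):
        return None, i
    return {"title": title, "authors": authors, "year": int(tokens[j][1])}, j + 1


def parse_repository(tokens):
    """Repository -> Record*, skipping tokens that cannot start a Record."""
    records, i = [], 0
    while i < len(tokens):
        record, nxt = parse_record(tokens, i)
        if record is None:
            i += 1
        else:
            records.append(record)
            i = nxt
    return records


def extract_metadata(html):
    """Tokenize the HTML by style, then parse the tokens with the grammar."""
    tokenizer = StyleTokenizer()
    tokenizer.feed(html)
    tokenizer.close()
    return parse_repository(tokenizer.tokens)


if __name__ == "__main__":
    page = """
    <h2>Paper One</h2> <i>A. Author</i> <i>B. Author</i> 1998
    <p>unrelated text</p>
    <b>Paper Two</b><em>C. Author</em> 1999
    """
    for record in extract_metadata(page):
        print(record)
```

Separating the visual abstraction from the grammar means two pages that mark titles with different tags (here `<h2>` and `<b>`) can still be parsed by the same rules. Extending the sketch to hyperlinked structures, as the abstract describes, would require treating links as additional grammar symbols; that is not shown here.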

Similar resources

Identifying Bibliographic Relationships in the National Library of Iran Catalog Based on the Functional Requirements for Bibliographic Records (FRBR) Model: A First Step in Representing the Knowledge Network of Iranian-Islamic Publications

The aim of this study is to identify the bibliographic relationships between the metadata records in the National Library and Archives of Iran (NLAI) according to the FRBR model, in order to represent the knowledge network of Iranian-Islamic publications. To achieve this objective, the content analysis method was used. The study population includes metadata records for books in NLAI for four biblio...

Metadata for Adaptive and Distributed Learning Repositories

Web-based learning is one of the important applications of the World Wide Web, which makes location- and time-independent learning scenarios possible. Content can also be kept up to date, discussions and interactions between instructors and learners can be supported, and new materials can easily be distributed to the students. We will discuss in this talk three aspects of web-based learning material...

Investigating the Reaction of Web Search Engines to Metadata Records Based on a Combined Microdata and Linked Data Method

The purpose of this research was to determine the reaction of web search engines to metadata records created with a combined method of Rich Snippets and Linked Data. 200 metadata records in two groups (100 records with the normal structure as the control group, and 100 records created based on microdata and implemented in RDF/XML as the experimental group) extracted from the information gatewa...

Modeling, Exploring and Recommending Music in Its Complexity

Knowledge models that are currently in use for describing music metadata are insufficient to express the wealth of complex information about creative works, expressions, performances, publications, authors and performers. In this research, we aim to propose a method for structuring the classical music information coming from different heterogeneous librarian repositories. In particular, we rese...

Managing metadata in open learning repositories and P2P networks

“Now, miraculously, we have the Web. For the documents in our lives, everything is simple and smooth. But for data, we are still pre-Web.” (Tim Berners-Lee, Business Model for the Semantic Web) The successful use and re-use, search, and operation of data depend on the effective definition, use and management of metadata. The first part of this thesis considers the issues related to learning me...

Publication date: 1998